Building Thesaurus from Manual Sources and Automatic Scanned Texts

Author

  • Jean-Pierre Chevallet
Abstract

This paper describes the work done in the TIPS project on the construction of a thesaurus base. The base is built by merging a manually built thesaurus with one automatically extracted from large text corpora. Several manually built thesauri have been semiformatted so that they can be merged into a consistent common base. The automatic extraction relies on both syntax and statistics. We present the way the thesauri are built and the results on a scientific corpus in the context of the TIPS project.

Similar resources

Multilingual Ontology Enrichment for Semantic Annotation and Retrieval of Medical Information

Background: Knowledge management in the European project Noesis addresses concept-based annotation and multilingual Information Retrieval of documents. Objective: Multilingual enrichment of a concept-based terminology in the medical field. Experience and evaluation in the domain of cardiovascular diseases by enriching a subset of the MeSH thesaurus in six European languages. This terminology, r...

Methodology For Building Thematic Indexes In Medicine For French

The aim of this project is to propose a methodology for automatically building a thematic index from French medical texts in order to improve the IR process. In this article, we focus on the process of selecting relevant terms. Contrary to Bourigault and Charlet (1999), who defend a statistical method followed by human intervention, we propose an automatic method that takes advantage of available a...

Conceptual Business Process Structuring by Extracting Knowledge from Natural Language Texts

This article discusses methods of constructing a formalized structure of a subject domain based on analysis of natural language texts, including discovering objects, their properties and related actions, followed by discovering business processes specific to the subject domain and the formation of thesaurus and business processes of the subject domain. At the same time the thesaurus can be chan...

Construction of Thematic Representations of Texts Based on Domain-Specific Thesaurus

The paper considers interrelations between lexical cohesion and the thematic structure of a text. The technique of automatic construction of the thematic representation of the text contexts is described. The technique uses knowledge from Sociopolitical thesaurus, which was specially developed as a tool for automatic text processing.

MeSH Up: effective MeSH text classification for improved document retrieval

Motivation: Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small...

Publication date: 2002